Baseline correction for chromatography.



Peak detection for a chromatogram is improved
by removing systematic errors in the chromatogram
using orthogonal subtraction. Orthogonal subtraction
involves subtracting from each spectrum in the
chromatogram its expression in a spectral space
representing the systematic errors. The data used in
constructing the spectral space can be obtained in
the form of spectra in the chromatogram occurring
between component peaks. Principal component
analysis can be applied to obtain a series of principal
factors. A "hook method" can be applied to
determine an optimum number of factors to use in
constructing the spectral space, which is defined by
the selected factors after normalization.






EP 0 299 652 A1 






BASELINE CORRECTION FOR CHROMATOGRAPHY 



BACKGROUND OF THE INVENTION 



The present invention relates to chromatography
and, more particularly, to a system and
method providing for mathematical correction of
systematic baseline errors in chromatograms.

While having a broad range of applications, the 
present invention arose in the context of liquid 
chromatography systems using ultraviolet and visi- 
ble light spectral analysis in the generation of 
chromatograms. One challenge faced by such 
chromatography systems is that systematic errors 
in the spectra, introduced by solvents and other 
sources, interfere with analysis of peak shape and 
component identification. A major objective of the 
present invention is to provide a flexible mathemat- 
ical method for minimizing the effects of such 
systematic errors in chromatographic analysis. 

Liquid chromatography typically involves sepa- 
ration of the components of a sample mixture by 
movement of a solvent mobile phase over a solid 
stationary phase in a chromatograph column. Each 
mixture component is partitioned according to a 
characteristic "partition coefficient" between the 
phases, depending on the solvent or solvent mix- 
ture in the column at the time. As the mobile phase 
moves past the stationary phase, repeated adsorp- 
tion and desorption of the component occurs at a 
rate determined chiefly by its ratio of distribution 
between the two phases. If the partition coefficients 
for the different mixture components are sufficiently 
different, the components exit the effluent end of 
the column in a series of bands which, theoreti- 
cally, can be analyzed to determine the identity 
and original concentration of each mixture compo- 
nent. 

A spectrometer can be used to analyze the 
eluting components by generating a chromatogram 
comprised of a series of spectra. Typically, a 
chromatogram is characterized by a series of 
peaks, each peak ideally representing a gradually 
rising and declining magnitude of a pure spectral 
component traceable to an individual mixture com- 
ponent. Theoretically, by comparing the detected 
spectra of a peak with known spectra for various 
compounds, the component can be identified. By 
integrating each identified component over its cor- 
responding peaks, the relative concentrations of the 
components in the original mixture can be deter- 
mined, at least in the ideal. 

Inevitably, errors in the chromatogram adversely
affect determination of component identity and
concentration. Errors can be both random and
systematic; the present invention addresses
the latter. Systematic errors include those with
components which have a constant spectral shape
but vary in magnitude, resembling the component
spectra themselves in this regard. Most systematic
errors, however, are characterized by a wider temporal
distribution than component peaks.

Systematic errors are introduced as a matter of
course, as raw spectral data reflect the spectra of
one or more solvents of the mobile phase as well
as the mixture components. There are additional
sources of systematic errors, including changes in
spectral absorbance due to temperature or other
effects, variations in spectrometer lamp intensity
and color, and variations in the spectrometer detector
sensitivity.

Systematic errors due to solvent spectral absorbance
are usually addressed by subtracting a
spectral component attributed to one or more solvents
from a chromatogram. This can be a relatively
simple procedure where a single solvent is
involved. However, more complex procedures use
multiple solvents in time-varying ratios to accommodate
complex mixtures with components having
a wide range of solvent characteristics. In these
more complex cases, "subtraction" involves subtracting
the right components in the right concentrations
at the right times.

In practice it is difficult to know what solvents
are eluting in what concentrations at any given
time. Irregularities in the pumping and mixing apparatus
used to introduce solvent mixtures into the
chromatographic column can create unintended
transients and fluctuations prior to introduction.
Some of these time-varying effects can be addressed
by subtracting blank run chromatograms.
The solvents can be run through a column without
the mixture so as to produce a solvent chromatogram.
The solvent chromatogram can be subtracted
from the chromatogram of interest to remove the
contribution of the solvents to the spectral data.

However, the blank run approach does not address
other time-varying systematic errors or interactive
effects between the solvents and the mixture
components. The blank run approach is costly in
that a new blank run is required for each solvent
setup. In fact, several blank runs are needed to
place a confidence level on the solvent chromatogram,
since variations can occur from run to run.
These variations constrain the extent to which
blank run subtraction can address systematic errors
in a chromatogram. In practice, even after
correction by current methods, significant systematic
errors remain in chromatographic data, especially
when complex solvent systems are involved.






Another problem in determining the identity
and relative concentrations of mixture components
concerns the inability of a given solvent system to
separate all mixture components. For example, if
two or more mixture components have nearly the
same partition coefficient between the mobile and
stationary phases, they tend to elute at about the
same time. The result is that the corresponding
component peaks overlap.

The problem of overlap can be addressed
mathematically. Simple mathematical peak-shape
tests permit identification of chromatogram features
representing overlapping component peaks. More
complex mathematical procedures can be used to
deconvolve overlapping peaks so that the identity
and relative concentrations of the overlapping components
can be determined.

The mathematical procedures used in peak-shape
tests and deconvolution are highly sensitive
to systematic errors. As chromatography is applied
to increasingly complex mixtures, it becomes increasingly
difficult to resolve all mixture components
chemically. Accordingly, it is becoming increasingly
important to remove systematic errors
from chromatographic data so that mathematical
methods can supplement more effectively the
spectral analysis of mixtures.

SUMMARY OF THE INVENTION



In accordance with the present invention, systematic
errors can be removed from a chromatogram
by replacing its spectra with their components
orthogonal to a spectral space representing the
systematic errors. This method involving orthogonal
subtraction can be used subsequent to or in place
of "blank subtraction" or other grosser attempts to
remove systematic errors from a chromatogram.

The present invention requires the generation
of a spectral space from spectra representative of
systematic errors in a given chromatogram. In most
cases, this data can be obtained by sampling a
chromatogram between component peaks. The selection
of appropriate samples can be done visually
by choosing various points removed from each
other and from significant peaks. More reliable mathematical
methods are available, for example, based
on the first and higher order time derivatives of the
various spectra.

A spectral space can then be defined by the
sample spectra so selected. The preferred method
of constructing such a space involves principal
component analysis. The sample spectra can be
reexpressed as vector sums in a principal component
space of orthogonal factors. Principal component
analysis yields a set of vector components in
decreasing order of significance in characterizing
the sample data. Preferably a "hook" method or
other algorithm is used to determine a suitable
number of principal factors for characterizing the
systematic error spectra. The selected principal
factors define the spectral space used to remove
systematic error from the given chromatogram.

Once the spectral space is determined, each 
spectrum of the original chromatogram is ex- 
pressed in this space. From each original spectrum 
is subtracted its expression in the spectral space to 
yield a corresponding modified spectrum. This can 
be performed for all original spectra in the original 
chromatogram or any time interval thereof. The 
modified spectra so obtained then constitute a 
modified chromatogram. 

With systematic errors thus removed, peak de- 
tection can be performed more reliably. Once a 
peak is reliably detected, the spectra constituting 
the peak can be added to provide a relatively error- 
free spectral component. This spectral component 
can be used for component identification by com- 
paring this spectral component with standard spec- 
tra modified by subtraction of the standard spec- 
trum in the same spectral space used to define the 
modified chromatogram. Alternatively, the modified 
component spectra can "resume" a more standard 
form by adding the orthogonal components sub- 
tracted in order to define the peak. Finally, math- 
ematical peak-shape detection of overlapping com- 
ponent peaks and deconvolution can be applied 
with minimal distortion due to systematic errors. 

The present invention has advantages over and 
above effective minimization of linearly additive 
systematic errors which are constant in shape but 
temporally varying in amplitude. While the spectra
representing the systematic errors can be taken 
from a blank run or other source, this is avoidable. 
Systematic error spectra usually can be selected 
from between component peaks as well as at the 
beginning and end of a chromatographic run. Since 
blank runs are not required, many of the disadvan- 
tages associated with blank run methods are avoid- 
ed, including the time and expense involved in 
blank runs. Additionally, errors due to inter-run vari- 
ability are eliminated. 

A related set of advantages flows from the fact 
that principal component analysis does not require 
prior knowledge of the sources of the systematic
errors. Whereas prior procedures required knowledge
and deliberate accommodation of each solvent
system, the present invention automatically
adapts to solvent system changes as reflected in 
the constructed spectral space. As a corollary, the 
present invention copes with systematic errors due 
to all sources, known or not, including temperature 
variations and spectrometer hardware related vari- 
ations. 

Accordingly, the present invention provides a
system and method incorporating orthogonal subtraction
and providing more effective minimization
of systematic errors in chromatograms without prior
blank runs or knowledge about the spectra representing
the systematic errors. Other features and
advantages of the present invention are apparent
from the description below with reference to the
following drawings.


BRIEF DESCRIPTION OF THE DRAWINGS 

FIGURE 1 is a chromatography system in
accordance with the present invention.

FIGURE 2 is a chromatogram produced by
the system of FIG. 1 prior to base-line correction.

FIGURE 3 is the chromatogram of FIG. 2 as
modified by base-line correction in accordance with
the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

A chromatography system 10 in accordance
with the present invention comprises a chromatograph
column 20, a spectrometer 30, a computer 40
including a chromatogram generator 50 and a
base-line correction module 60, and a display 70. A
sample mixture is introduced into the spectrometer
30 by means of a mobile phase carrier. The output
of the spectrometer 30 is a series of spectra which
are collected, stored and processed by the computer 40.

The chromatogram generator 50 organizes the
data into a chromatogram, without base-line correction
in accordance with the present invention. The
chromatogram 200 of FIG. 2 is a graphic representation
of chromatographic data organized by the
chromatogram generator 50. After base-line correction,
the chromatographic data takes the form of
chromatogram 300 of FIG. 3. Chromatograms 200
and 300 represent wavelength-averaged temporal
sections of more extended experimentally obtained
chromatograms.

Chromatogram 200 includes primary peaks around
spectrum numbers j = 780, 890, 940, 1040,
and 1140. The primary peak at 780 appears to
overlap secondary peaks, at about 760 and 800, to
either side. The primary peak at 890 appears to
overlap a secondary peak at about 910. Less resolved
features at 950 and 1050 appear to follow
the primary peaks at 940 and 1040. The regions
around spectra 850, 1000, 1100 and 1150 appear to be
appropriate candidates for spectra representing solvent
spectra and spectra for other systematic errors
rather than component spectra of interest.
These regions can be identified visually, or using
mathematical algorithms which, for example, examine
the first and higher order derivatives of the
chromatogram 200. In the present case, individual
spectrum numbers 830, 835, 975, 980, 1080, 1095,
1100, 1105, 1170, 1175, 1180, 1225 and 1230 are
used to represent systematic errors.
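
By way of illustration only, a derivative-based selection of candidate systematic error spectra might be sketched in Python with NumPy as follows. The function name `select_baseline_indices`, the threshold `rel_tol`, and the use of a wavelength-averaged trace are assumptions made for this sketch; they are not taken from the described embodiment.

```python
import numpy as np

def select_baseline_indices(R, rel_tol=0.02):
    """Return spectrum numbers j where the chromatogram is locally flat.

    R       : array of shape (n_frequencies, n_spectra), one spectrum per column.
    rel_tol : hypothetical threshold, as a fraction of the largest derivative.
    """
    trace = R.mean(axis=0)      # wavelength-averaged chromatogram
    d1 = np.gradient(trace)     # first time derivative
    d2 = np.gradient(d1)        # second time derivative
    # A spectrum is a candidate when both derivatives are small, which
    # excludes peak flanks (large d1) and peak apexes (large d2).
    flat = (np.abs(d1) <= rel_tol * np.abs(d1).max()) & \
           (np.abs(d2) <= rel_tol * np.abs(d2).max())
    return np.flatnonzero(flat)
```

In practice the candidates returned would still be thinned to points well removed from each other and from significant peaks, as in the list of spectrum numbers given above.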

Alternatively, clusters of spectra, rather than
individual spectra, can be used to generate the
spectral space. Adjacent spectra can be clustered, 
and cluster averages used in the principal compo- 
nent analysis. This allows more data to be used 
and reduces the effects of random variations in 
individual spectra. 
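
A minimal sketch of the clustering alternative, assuming the chromatogram is held as a NumPy array with one spectrum per column, is given below. The function name `cluster_averages` and the representation of clusters as lists of spectrum indices are hypothetical.

```python
import numpy as np

def cluster_averages(R, clusters):
    """Average adjacent spectra within each cluster.

    R        : array of shape (n_frequencies, n_spectra).
    clusters : iterable of index lists, e.g. [[830, 831, 832], [975, 976]].
    Returns an array with one averaged spectrum per column.
    """
    return np.column_stack([R[:, list(c)].mean(axis=1) for c in clusters])
```

The averaged columns would then take the place of the individual systematic error spectra in the principal component analysis.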

Each systematic error spectrum $R_e(j)$ is a vector
representing a series of absorbance intensities
$R_e(j,k)$ at each of the spectral frequencies $k$ in the
range of the spectrometer 30. These can be arranged
into a matrix $R_e$, the columns of which
represent systematic error vectors $R_e(j)$, and the
rows of which correspond to individual spectral
frequencies. Principal component analysis of this
matrix $R_e$ yields a series of principal factors $F_1$, $F_2$,
$F_3$, ... in order of declining importance in characterizing
the vectors $R_e(j)$.
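
One way the principal factors might be computed is by singular value decomposition of the systematic error matrix, sketched below. This assumes, for illustration, that no mean-centering is applied and that each factor is a left singular vector scaled by its singular value; the function name `principal_factors` is hypothetical.

```python
import numpy as np

def principal_factors(R_e):
    """Principal factors of the systematic error matrix via SVD.

    R_e : array of shape (n_frequencies, n_error_spectra), one spectrum per column.
    Returns (F, s): column k of F is the k-th factor (singular value times the
    unit-length left singular vector), and s holds the singular values in
    decreasing order.
    """
    U, s, _ = np.linalg.svd(R_e, full_matrices=False)
    F = U * s  # broadcasting scales column k of U by s[k]
    return F, s
```

Under these assumptions the factors are mutually orthogonal and the length of each factor equals its singular value, so dividing a factor by its singular value yields a unit vector.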

Each of the vectors $R_e(j)$ can be expressed as
a linear sum of all the principal factors. Principal
component analysis permits significant gains in
computational efficiency at some cost in precision
by permitting less significant factors to be dropped
from further processing. Various algorithms are
available for assigning a cutoff point in the principal
factor series. A "hook method" selects $F_0$ as the
last factor retained, where $F_0$ is the last $F_k$ for
which $\sigma_k/\sigma_{k+1} > 3$, where $\sigma_k$ is the singular value
corresponding to the vector $F_k$. The chromatogram
of FIG. 3 resulted from the use of three principal
factors.
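
The cutoff rule can be sketched as follows, reading the criterion as the ratio of each singular value to the next exceeding 3. The function name `hook_cutoff`, the `ratio` parameter, and the fallback of retaining every factor when no ratio exceeds the threshold are assumptions of this sketch.

```python
import numpy as np

def hook_cutoff(singular_values, ratio=3.0):
    """Number of principal factors to retain under the hook criterion.

    singular_values : strictly positive values in decreasing order.
    Returns the 1-based index of the last factor whose singular value exceeds
    `ratio` times the next one; if there is none, all factors are retained
    (an assumed fallback).
    """
    s = np.asarray(singular_values, dtype=float)
    ratios = s[:-1] / s[1:]              # sigma_k / sigma_(k+1)
    hits = np.flatnonzero(ratios > ratio)
    return int(hits[-1]) + 1 if hits.size else len(s)
```

For example, singular values of 100, 20, 5, 1.2, 1.0 and 0.9 give successive ratios of 5, 4, about 4.2, 1.2 and about 1.1, so three factors are retained.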

Alternative mathematical criteria are available. 
Other methods include assigning a fixed number of 
principal factors, e.g., 3, or equating the number of 
principal factors with the number of solvents or 
number of suspected major sources of systematic 
error. Another approach is to carry out the method 
of the present invention, first using one principal 
factor, and reiterating while incrementing the num- 
ber of principal factors until the results converge, 
i.e., the baseline is substantially eliminated. 
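
The iterative alternative might be sketched as below. Convergence is judged here by how much of the systematic error spectra remains after subtraction, relative to a hypothetical tolerance `tol`; this stopping test and the function name `factors_until_converged` are assumptions, since the text does not fix a particular convergence measure.

```python
import numpy as np

def factors_until_converged(R_e, tol=1e-2):
    """Increase the number of factors until the baseline residual is small.

    R_e : array of shape (n_frequencies, n_error_spectra).
    tol : hypothetical tolerance on residual norm relative to the norm of R_e.
    """
    U, s, _ = np.linalg.svd(R_e, full_matrices=False)
    total = np.linalg.norm(R_e)
    for n in range(1, len(s) + 1):
        G = U[:, :n]                       # first n orthonormal vectors
        residual = R_e - G @ (G.T @ R_e)   # baseline left after subtraction
        if np.linalg.norm(residual) <= tol * total:
            return n
    return len(s)
```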

An orthonormal set of vectors $G_k$ is then constructed
according to the equation $G_k = F_k/\sigma_k$. With
respect to chromatogram 200, orthonormal vectors
$G_1$, $G_2$, $G_3$ define a spectral space $G$. Alternatively,
a spectral space of orthonormal vectors can be
obtained in other ways. For example, a spectral
space can be derived from the known spectra of
the solvents by Gram-Schmidt orthogonalization.
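
A sketch of the Gram-Schmidt alternative follows, assuming the known solvent spectra are supplied as columns of a NumPy array. The function name `gram_schmidt`, the tolerance `eps`, and the choice to drop spectra that are linearly dependent on earlier ones are assumptions of the sketch, which uses the modified form of the procedure for numerical stability.

```python
import numpy as np

def gram_schmidt(spectra, eps=1e-12):
    """Orthonormalize solvent spectra given as columns of `spectra`.

    Spectra that are (numerically) linear combinations of earlier ones are
    skipped, so the result may have fewer columns than the input.
    """
    spectra = np.asarray(spectra, dtype=float)
    basis = []
    for v in spectra.T:
        w = v.copy()
        for g in basis:
            w -= (g @ w) * g               # remove the part along g
        norm = np.linalg.norm(w)
        if norm > eps * max(np.linalg.norm(v), 1.0):
            basis.append(w / norm)
    if not basis:
        return np.empty((spectra.shape[0], 0))
    return np.column_stack(basis)
```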

Once the spectral space $G$ is determined, each
spectrum $R(j)$ of chromatogram 200 can be expressed
in that space as follows:

$$Q(j) = \sum_k \left(R^T(j)\,G_k\right) G_k$$

where $R^T(j)$ is the transpose of $R(j)$.

The spectra $S(j)$ of chromatogram 300 of FIG.
3 can then be obtained by subtracting from each
$R(j)$ its expression in the spectral space $G$ of systematic
errors. This orthonormal subtraction can be
expressed as $S(j) = R(j) - Q(j)$, where the vectors
$S(j)$ constitute a matrix $S$, which is represented by
chromatogram 300 of FIG. 3.
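
The projection and subtraction can be carried out for all spectra at once with two matrix products, as in the sketch below. It assumes the orthonormal vectors spanning the spectral space are stored as the columns of an array `G` and the chromatogram as the columns of `R`; the function name `orthogonal_subtraction` is hypothetical.

```python
import numpy as np

def orthogonal_subtraction(R, G):
    """Subtract from each spectrum its expression in the spectral space.

    R : array of shape (n_frequencies, n_spectra), one spectrum per column.
    G : array of shape (n_frequencies, n_factors) with orthonormal columns.
    Returns (S, Q), where column j of Q is the expression of spectrum j in
    the space spanned by G and S = R - Q is the modified chromatogram.
    """
    Q = G @ (G.T @ R)   # sum over k of (R(j)^T G_k) G_k, for every j
    S = R - Q
    return S, Q
```

Every column of `S` is orthogonal to every column of `G`, so a spectrum lying entirely within the systematic error space is reduced to zero.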

Matrix $S$ provides a sounder basis for peak
detection than does the matrix $R$ corresponding to
chromatogram 200. Peaks can be detected from a
plot of successive values of $|S(j)|$, the magnitudes of
vectors $S(j)$, over a range of $j$'s. Comparison of
chromatograms 300 and 200 indicates the advantages
of orthonormal subtraction in delimiting
peaks. For example, the primary peaks at 890 and
940 are better resolved relative to each other. In
other words, one can determine more readily
where the peak centered at 890 ends and the peak
centered at 940 begins. Furthermore, the boundaries
of the primary peaks at 1040 and 1140 are much
more clearly defined.
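
A simple form of peak detection on the magnitude trace is sketched below. The local-maximum rule, the `min_height` threshold, and the function name `detect_peaks` are assumptions chosen for brevity; they stand in for whatever peak detector is actually applied to the modified chromatogram.

```python
import numpy as np

def detect_peaks(S, min_height):
    """Locate peaks in the magnitudes of the modified spectra.

    S          : array of shape (n_frequencies, n_spectra).
    min_height : hypothetical threshold below which maxima are ignored.
    Returns (indices, mag), where mag[j] is the magnitude of spectrum j and
    indices lists the spectrum numbers that are local maxima of mag.
    """
    mag = np.linalg.norm(S, axis=0)   # magnitude of each spectrum
    inner = mag[1:-1]
    is_max = (inner > mag[:-2]) & (inner >= mag[2:]) & (inner >= min_height)
    return np.flatnonzero(is_max) + 1, mag
```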

The minor peak preceding the primary peak
centered at 780 is more clearly defined. Mathematical
deconvolution now can be applied more successfully
to determine the overlapping component
spectra represented by spectra numbers 755-825.
A peak-purity test can be applied to the remaining
peaks to determine whether deconvolution is required
elsewhere. Performing deconvolution using
the modified spectra $S(j)$ minimizes the likelihood
that a solvent might appear as a spurious mixture
component.

To the extent that the pure component spectra
for the sample mixture are orthogonal to the spectra
for systematic noise, the modified spectra correspond
to the real spectra for the mixture components.
Peak identification can be performed by
correlating one of the spectra with spectra for
known compounds. Preferably, several $S(j)$ are
averaged within a peak to improve signal-to-noise
ratio before comparison.
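
Identification by correlation might be sketched as follows. The use of the Pearson correlation coefficient, the dictionary `library` of known spectra, and the function name `identify` are assumptions for illustration; any suitable similarity measure could be substituted.

```python
import numpy as np

def identify(S, peak_indices, library):
    """Match an averaged peak spectrum against known spectra.

    S            : modified chromatogram, shape (n_frequencies, n_spectra).
    peak_indices : spectrum numbers belonging to one peak.
    library      : dict mapping compound name to a reference spectrum sampled
                   at the same spectral frequencies (hypothetical input).
    Returns (best_name, scores).
    """
    mean_spectrum = S[:, peak_indices].mean(axis=1)  # average within the peak
    scores = {name: float(np.corrcoef(mean_spectrum, ref)[0, 1])
              for name, ref in library.items()}
    best = max(scores, key=scores.get)
    return best, scores
```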

More generally, orthonormal subtraction using
spectral space $G$ can be applied to standard spectra
prior to correlation with the empirically determined
spectra to aid in component identification.
Alternatively, the standard spectra can be compared
unmodified with a chromatographically isolated
spectrum obtained by summing over index $j$ over a
peak: $\sum_j \left(S(j) + Q(j)\right)$.
This restores the original form of the peak, while
taking advantage of the reduced systematic noise
in defining the peak. Where mathematical deconvolution
is applied, the estimated spectra of the
pure components are linear combinations of the
original spectra $R(j)$, which can be reconstructed
from the corresponding modified spectra $S(j)$ and
the orthonormal subtrahends $Q(j)$ as in the case of
isolated component peaks.
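
Both alternatives can be sketched briefly, assuming the orthonormal vectors of the spectral space are the columns of an array `G` and that the subtracted expressions have been kept in an array `Q` alongside the modified spectra `S`. The function names `project_out` and `restored_peak_spectrum` are hypothetical.

```python
import numpy as np

def project_out(x, G):
    """Apply the same orthogonal subtraction to a standard spectrum x."""
    return x - G @ (G.T @ x)

def restored_peak_spectrum(S, Q, peak_indices):
    """Sum S(j) + Q(j) over a peak to recover its original spectral form."""
    return (S[:, peak_indices] + Q[:, peak_indices]).sum(axis=1)
```

In the first case a standard spectrum modified by `project_out` is compared with the modified peak spectrum; in the second, the unmodified standard is compared with the restored sum, the peak limits having been chosen on the modified chromatogram.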

The present invention provides for many modi- 
fications of the foregoing embodiments. A system- 
atic error spectral space can be constructed from 
known solvent or other spectra, or from blank runs 
or from a current run, or some combination of the 
foregoing. Where current or blank runs are used, 
there are alternative approaches to selecting repre- 
sentative spectra. Several approaches can be used 
to construct a spectral space from data, including 
principal component analysis and Gram-Schmidt 
orthogonalization. Clustered or unclustered spectra
can be used in these constructions. The dimensionality
of the spectral space can be predetermined
or determined using various criteria. In addition,
there are methods which differ in form from,
but are mathematically equivalent to, the described 
methods. These and other modifications and vari- 
ations are provided by the present invention, the 
scope of which is limited only by the following 
claims. 



Claims 

1. A system comprising:

spectra means for generating a series of spectra;

construction means for constructing an
orthonormal spectral space from said series, said
spectral space representing systematic errors in
said series; and

subtraction means for subtracting from each
spectrum in said series the expression of the same
spectrum in said spectral space.

2. The system of Claim 1 wherein said series 
of spectra constitute a chromatogram. 

3. The system of Claim 2 further comprising 
means for identifying systematic error spectra of 
said chromatogram representing systematic errors 
in said chromatogram. 

4. The system of Claim 3 wherein said con- 
struction means includes principal component 
means for performing principal component analysis 
on said systematic error spectra. 

5. The system of Claim 4 wherein said princi- 
pal component means includes threshold means 
for determining a suitable number of principal fac- 
tors to be used in constructing said spectral space. 

6. A method comprising: 
generating a chromatogram; 

constructing a spectral space representing 
systematic errors in said chromatogram; and 

subtracting from each spectrum in said
chromatogram the expression of that spectrum in
said spectral space.

7. The method of Claim 6 wherein said constructing
step includes a substep of identifying
systematic error spectra of said chromatogram,
said systematic error spectra representing systematic
errors in said chromatogram.

8. The method of Claim 7 wherein said constructing
step includes a substep of performing
principal component analysis on said systematic
error spectra.

9. The method of Claim 8 wherein said constructing
step includes a substep of determining a
suitable number of principal factors to be used in
constructing said spectral space.